aws: support for bring-your-own hosted zone #4772
openshift-merge-robot merged 4 commits into openshift:master
Conversation
Force push: 24a81c2 to 6676aa8
/test e2e-aws-shared-vpc
Force push: 6676aa8 to 5659a4f
Force push: 5659a4f to 34aef82
Force push: 34aef82 to fcf44b7
patrickdillon left a comment:
I have reviewed the first three commits and will take a closer look at the destroy code soon.
I don't get why we continue here. It seems like it is ok to have a record equal to the cluster domain. Is that the case?
Yes, every hosted zone has two record sets, NS and SOA, whose name is the same as the hosted zone's domain. I will add a comment about this.
Is invalid the right type of error here? Perhaps InternalError would be more suitable?
Yep, that makes sense.
nit: comment does not match function name
patrickdillon left a comment:
This looks mostly sane to me with a few nits and a couple of questions, especially regarding the trailing dot. But it looks good. I may try to take another closer look as I don't have a lot of background in AWS destroy.
nit: I'm not sure the + in the verb %#+v is doing anything. From fmt:
%v the value in a default format
when printing structs, the plus flag (%+v) adds field names
%#v a Go-syntax representation of the value
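For context, a tiny standalone illustration (not code from this PR) of how those flags combine for a struct; once # is set, the + flag does not appear to change the output:

package main

import "fmt"

type zone struct{ ID, Name string }

func main() {
	z := zone{ID: "Z123", Name: "example.com"}
	fmt.Printf("%v\n", z)   // {Z123 example.com}
	fmt.Printf("%+v\n", z)  // {ID:Z123 Name:example.com}
	fmt.Printf("%#v\n", z)  // main.zone{ID:"Z123", Name:"example.com"}
	fmt.Printf("%#+v\n", z) // prints the same as %#v for this struct
}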
I agree. This was copied from installer/pkg/destroy/aws/aws.go (line 1880 in 6d778f9).
nit (or maybe more of a rant): I realize (now) this code is not original to this PR, but it seems like a convoluted way of doing:

key := "kubernetes.io/cluster/" + clusterID
o.removeSharedTag(ctx, session, tagClients, key, tracker)

Perhaps the existing approach is more future-proof. Let's not fix what's not broken; just ranting, or maybe I'm missing something.
Yes, this is a bit of code that always takes me too long to understand every time that I look at it. I don't know that we want to rely on the clusterID being set correctly, although I can't think of a good reason why it wouldn't be.
Maybe it would be sufficient to have a function that gets the cluster tag keys from the filter.
func clusterOwnedKeys(filters []Filter) []string {
	var keys []string
	for _, filter := range filters {
		for key, value := range filter {
			if !strings.HasPrefix(key, "kubernetes.io/cluster/") {
				continue
			}
			if value != "owned" {
				continue
			}
			keys = append(keys, key)
		}
	}
	return keys
}
And then the removeSharedTags function could be the following.
func removeSharedTags(ctx context.Context, tagClients []*resourcegroupstaggingapi.ResourceGroupsTaggingAPI, filters []Filter, logger logrus.FieldLogger) error {
	for _, key := range clusterOwnedKeys(filters) {
		if err := removeSharedTag(ctx, tagClients, key, logger); err != nil {
			return err
		}
	}
	return nil
}
Should this be capitalized: nothing -> Nothing?
Or is that not necessary because it is WithField?
It should be capitalized.
Same question about caps: no -> No
Is the name of the record set dotted, or is it the value of the record that is dotted? When I look in the GUI it looks like the name isn't dotted but the value is.
The name of the record set is dotted.
For example,
{
  "Name": "api-int.ewolinetz3.devcluster.openshift.com.",
  "Type": "A",
  "AliasTarget": {
    "HostedZoneId": "ZLMOA37VPKANP",
    "DNSName": "ewolinetz3-rpcfk-int-42decb5de56b9b0b.elb.us-east-2.amazonaws.com.",
    "EvaluateTargetHealth": false
  }
},
same comment as above about %#+v verb
same comment as above about %#+v verb
Should we note in the error message for the public record that we skipped deleting the private record?
No. The destroyer should try again later to delete the records.
Add the `.aws.hostedZone` field to the install config to support the user supplying an existing hosted zone to use as the cluster's internal private hosted zone. This can only be used when the user is also supplying their own VPC. The hosted zone must already be associated with the user-provided VPC. https://issues.redhat.com/browse/CORS-1666
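As a rough, hypothetical sketch of what that field could look like in the installer's AWS platform types (field and tag names are inferred from this description, not copied from the PR):

// Platform is the AWS-specific portion of the install config
// (only the fields relevant to this change are shown).
type Platform struct {
	// Subnets are the IDs of pre-existing subnets when the user brings
	// their own VPC; HostedZone may only be set together with these.
	Subnets []string `json:"subnets,omitempty"`

	// HostedZone is the ID of an existing Route 53 private hosted zone,
	// already associated with the user-provided VPC, to use as the
	// cluster's internal zone instead of creating a new one.
	HostedZone string `json:"hostedZone,omitempty"`
}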
When the user provides an existing hosted zone to use for the cluster's private zone, the dns.config.openshift.io resource needs to be adjusted so that the operator can locate the hosted zone. Since the hosted zone already exists, we can specify the ID of the hosted zone rather than relying on tags.
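A minimal sketch of that adjustment, assuming the openshift/api config/v1 types (the helper name and surrounding asset code are made up):

import configv1 "github.com/openshift/api/config/v1"

// setPrivateZone fills in the private-zone reference on dns.config.openshift.io.
// When the user brings their own hosted zone the ID is already known, so the
// operator can look the zone up by ID instead of by cluster tags.
func setPrivateZone(dns *configv1.DNS, userZoneID, infraID string) {
	if userZoneID != "" {
		dns.Spec.PrivateZone = &configv1.DNSZone{ID: userZoneID}
		return
	}
	dns.Spec.PrivateZone = &configv1.DNSZone{
		Tags: map[string]string{"kubernetes.io/cluster/" + infraID: "owned"},
	}
}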
Add validation in the "Platform Provisioning Check" asset for the user-provided internal hosted zone. The validation checks that (1) the hosted zone is associated with the user-provided VPC and that (2) the hosted zone does not contain any record sets for subdomains of the cluster's domain. The latter check is meant to provide a modicum of protection against the user accidentally installing a cluster again over one that is already installed. When it comes time to destroy the second, failed installation, the destroyer would not be able to tell that the record sets in the hosted zone are actually being used by a different cluster.
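A hedged sketch of those two checks using aws-sdk-go's route53 client (the function name and error wording are invented; the actual asset code may differ):

import (
	"fmt"
	"strings"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/route53"
)

// validateHostedZone checks that the user-supplied zone is associated with the
// user-supplied VPC and holds no record sets under the cluster's domain.
func validateHostedZone(client *route53.Route53, zoneID, vpcID, clusterDomain string) error {
	zone, err := client.GetHostedZone(&route53.GetHostedZoneInput{Id: aws.String(zoneID)})
	if err != nil {
		return err
	}
	associated := false
	for _, vpc := range zone.VPCs {
		if aws.StringValue(vpc.VPCId) == vpcID {
			associated = true
			break
		}
	}
	if !associated {
		return fmt.Errorf("hosted zone %s is not associated with VPC %s", zoneID, vpcID)
	}

	// Route 53 returns record names with a trailing dot. The zone apex always
	// carries NS and SOA records, so only strict subdomains count as conflicts.
	suffix := "." + strings.ToLower(strings.TrimSuffix(clusterDomain, ".")) + "."
	var conflict error
	err = client.ListResourceRecordSetsPages(
		&route53.ListResourceRecordSetsInput{HostedZoneId: aws.String(zoneID)},
		func(page *route53.ListResourceRecordSetsOutput, lastPage bool) bool {
			for _, rs := range page.ResourceRecordSets {
				name := strings.ToLower(aws.StringValue(rs.Name))
				if strings.HasSuffix(name, suffix) {
					conflict = fmt.Errorf("hosted zone %s already contains record sets for the cluster domain (e.g. %s)", zoneID, name)
					return false
				}
			}
			return true
		})
	if err != nil {
		return err
	}
	return conflict
}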
Force push: fcf44b7 to 9bc4190
@patrickdillon I have addressed your feedback.
When the user provides the private hosted zone to use for the cluster, the destroyer still needs to delete the cluster's record sets when the cluster is destroyed. There is no way to tag record sets, so we must rely on tagging the hosted zone as shared. When the destroyer encounters a hosted zone tagged as shared by the cluster, it deletes all record sets in that hosted zone that are strict subdomains of the cluster's domain. The cluster domain is added to the AWS cluster metadata so that it is available to the destroyer.
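A minimal sketch (helper name hypothetical, assuming only the standard library strings package) of the strict-subdomain test the destroyer could apply to each record set in a shared zone before deleting it:

// isClusterRecord reports whether recordName (as returned by Route 53, with a
// trailing dot) is a strict subdomain of clusterDomain. The cluster domain
// itself is excluded so the zone's own NS and SOA record sets are left alone.
func isClusterRecord(recordName, clusterDomain string) bool {
	name := strings.TrimSuffix(strings.ToLower(recordName), ".")
	domain := strings.TrimSuffix(strings.ToLower(clusterDomain), ".")
	return name != domain && strings.HasSuffix(name, "."+domain)
}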
Force push: 9bc4190 to cf18b17
I'm having a hard time reasoning through this destroy code in the context of avoiding deleting non-cluster records. I may also be hitting some limits in my knowledge. I need to revisit the code and will do that immediately, but we're in a bit of a time crunch, so I want to write this out to aid the process/discussion. What about this scenario? (I think this would pass the pre-provision check, but let me know if something else makes it invalid.)
- We have a shared hosted zone.
- We install a cluster with a cluster domain in that zone.
- We install a second cluster whose cluster domain is a subdomain of the first cluster's domain.
- We delete the first cluster.
Would the destroyer then delete the second cluster's records as well? If so, I am thinking we should lean toward deleting only whitelisted records in this shared scenario (e.g. only delete the api and api-int records).
/lgtm
For posterity, since this was discussed outside of this PR: yes, that will delete the records for the second cluster. The contention here is that it is a misconfiguration to have the second cluster use a cluster domain that is a subdomain of the first cluster's domain. The first cluster owns its entire cluster domain. We cannot check for this at pre-provision time because there is no way to (reliably) tell if a cluster with a parent domain is already in the hosted zone.
A whitelist is problematic because record sets can be created by in-cluster components with names that the installer does not know about. The installer only creates the api and api-int record sets. For example, the *.apps record set is created in-cluster.
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: staebler
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/retest Please review the full test history for this PR and help us cut down flakes.
/retest Please review the full test history for this PR and help us cut down flakes.
7 similar comments
@staebler: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
/cherry-pick release-4.7
@staebler: #4772 failed to apply on top of branch "release-4.7":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Add the .aws.hostedZone field to the install config to support the user supplying an existing hosted zone to use as the cluster's internal private hosted zone. This can only be used when the user is also supplying their own VPC. The hosted zone must already be associated with the user-provided VPC.

Add validation in the "Platform Provisioning Check" asset for the user-provided internal hosted zone. The validation checks that (1) the hosted zone is associated with the user-provided VPC and that (2) the hosted zone does not contain any record sets for subdomains of the cluster's domain.
The latter check is meant to provide a modicum of protection against the user accidentally installing a cluster again over one that is already installed. When it comes time to destroy the second, failed installation, the destroyer would not be able to tell that the record sets in the hosted zone are actually being used by a different cluster.
When the user provides the private hosted zone to use for the cluster, the destroyer still needs to delete the cluster's record sets when the cluster is destroyed. There is no way to tag record sets, so we must rely on tagging the hosted zone as shared. When the destroyer encounters a hosted zone tagged as shared by the cluster, it deletes all record sets in that hosted zone that are strict subdomains of the cluster's domain. The cluster domain is added to the AWS cluster metadata so that it is available to the destroyer.
https://issues.redhat.com/browse/CORS-1666